Importance sampling in reinforcement learning with an estimated behavior policy
Abstract
In reinforcement learning, importance sampling is a widely used method for evaluating an expectation under the distribution of data of one policy when the data has in fact been generated by a different policy. Importance sampling requires computing the likelihood ratio between the action probabilities of a target policy and those of the data-producing behavior policy. In this article, we study importance sampling where the behavior policy action probabilities are replaced by their maximum likelihood estimate under the observed data. We show that this general technique reduces variance due to sampling error in Monte Carlo style estimators. We introduce two novel estimators that use this technique to estimate expected values that arise in the RL literature. We find that these estimators reduce the variance of Monte Carlo sampling methods, leading to faster learning for policy gradient algorithms and more accurate off-policy evaluation. We also provide theoretical analysis showing that our new estimators are consistent and have asymptotically lower variance than...
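The idea in the abstract can be illustrated with a minimal sketch. It assumes a single-step (bandit-like) setting with discrete states and actions, and it is not the paper's two estimators. The function names `mle_behavior_policy` and `importance_sampling_estimate` and the toy data are hypothetical, chosen only for illustration. The behavior policy's maximum likelihood estimate is taken to be the empirical action frequency in each state, which is the MLE for a tabular policy.

```python
import numpy as np


def mle_behavior_policy(states, actions, n_states, n_actions):
    """Count-based maximum likelihood estimate of a tabular behavior policy."""
    counts = np.zeros((n_states, n_actions))
    # Accumulate one count per observed (state, action) pair.
    np.add.at(counts, (states, actions), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    # Unvisited states fall back to uniform; they never appear in the data,
    # so this choice does not affect the estimate below.
    uniform = np.full_like(counts, 1.0 / n_actions)
    return np.divide(counts, totals, out=uniform, where=totals > 0)


def importance_sampling_estimate(states, actions, rewards, target, behavior):
    """Mean of likelihood-ratio-weighted rewards (single-step setting)."""
    ratios = target[states, actions] / behavior[states, actions]
    return float(np.mean(ratios * rewards))


# Toy data (assumed): one state, two actions, true behavior policy uniform.
# Action 0 always pays 1 and action 1 always pays 0, so a target policy that
# always picks action 0 has true value 1.0.
states = np.array([0, 0, 0, 0])
actions = np.array([0, 0, 0, 1])  # action 0 happens to be over-sampled
rewards = np.array([1.0, 1.0, 1.0, 0.0])

pi_target = np.array([[1.0, 0.0]])
pi_b_true = np.array([[0.5, 0.5]])
pi_b_hat = mle_behavior_policy(states, actions, n_states=1, n_actions=2)

ordinary = importance_sampling_estimate(states, actions, rewards, pi_target, pi_b_true)
estimated = importance_sampling_estimate(states, actions, rewards, pi_target, pi_b_hat)
print(ordinary, estimated)
```

On this toy sample the true behavior probabilities give 1.5, because action 0 was drawn three times out of four rather than the expected two. The estimated probabilities (0.75, 0.25) give 1.0 up to floating-point rounding, which matches the true value. This is the sampling-error correction the abstract describes, shown on one hand-picked sample rather than as a general guarantee.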
Similar resources
Truncated Importance Sampling for Reinforcement Learning with Experience Replay
Reinforcement Learning (RL) is considered here as an adaptation technique of neural controllers of machines. The goal is to make Actor-Critic algorithms require less agent-environment interaction to obtain policies of the same quality, at the cost of additional background computations. We propose to achieve this goal in the spirit of experience replay. An estimation method of improvement direct...
Importance sampling for reinforcement learning with multiple objectives
This thesis considers three complications that arise from applying reinforcement learning to a real-world application. In the process of using reinforcement learning to build an adaptive electronic market-maker, we find the sparsity of data, the partial observability of the domain, and the multiple objectives of the agent to cause serious problems for existing reinforcement learning algorithms....
Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling
The speed and performance of learning depend on the complexity of the learner. A simple learner with few parameters and no internal states can quickly obtain a reactive policy, but its performance is limited. A learner with many parameters and internal states may finally achieve high performance, but it may take enormous time for learning. Therefore, it is difficult to decide in advance which a...
Policy Learning by GA using Importance Sampling
The most difficult problem in applying GA to policy learning is that interactions with the environment require much time to evaluate the individuals. In this paper, we propose a new approach to estimate an individual's value using importance sampling. Importance sampling reuses the experiences obtained by some policy to estimate the values of other policies. The proposed technique cuts down ...
Journal
Journal title: Machine Learning
Year: 2021
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-020-05938-9